Modules and packages are the core of any large project, and of the Python installation itself.
This chapter focuses on common programming techniques involving modules and packages,
such as how to organize packages, split large modules into multiple files,
and create namespace packages. Recipes that allow you to customize the
operation of the import statement itself are also given.
You want to organize your code into a package consisting of a hierarchical collection of modules.
Making a package structure is simple. Just organize your code as you wish on the filesystem and make sure that every directory defines an __init__.py file. For example:
graphics/
    __init__.py
    primitive/
        __init__.py
        line.py
        fill.py
        text.py
    formats/
        __init__.py
        png.py
        jpg.py

Once you have done this, you should be able to perform various import statements, such as the following:
import graphics.primitive.line
from graphics.primitive import line
import graphics.formats.jpg as jpg
Defining a hierarchy of modules is as easy as making a directory
structure on the filesystem. The purpose of the __init__.py files
is to include optional initialization code that runs as different
levels of a package are encountered. For example, if you have the
statement import graphics, the file graphics/__init__.py will be imported and form the contents
of the graphics namespace. For an import such as import graphics.formats.jpg, the files graphics/__init__.py and graphics/formats/__init__.py will both be imported prior to the final import of the
graphics/formats/jpg.py file.
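If you want to observe the initialization order for yourself, one simple experiment (not something you’d leave in real code) is to put a print() call in each __init__.py file:

# graphics/__init__.py
print('Initializing graphics')

# graphics/formats/__init__.py
print('Initializing graphics.formats')

A statement such as import graphics.formats.jpg would then print both messages, in that order, before graphics/formats/jpg.py itself executes.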
More often than not, it’s fine to just leave the __init__.py files empty. However, there are certain situations where they might include code. For example, an __init__.py file can be used to automatically load submodules like this:
# graphics/formats/__init__.py
from . import jpg
from . import png
For such a file, a user merely has to use a single import graphics.formats instead of a separate import for graphics.formats.jpg and graphics.formats.png.
Other common uses of __init__.py include consolidating definitions from multiple files into a single logical namespace, as is sometimes done when splitting modules. This is discussed in Recipe 10.4.
Astute programmers will notice that Python 3.3 still seems to perform package imports even if no __init__.py files are present. If you don’t define __init__.py, you actually create what’s known as a “namespace package,” which is described in Recipe 10.5. All things being equal, include the __init__.py files if you’re just starting out with the creation of a new package.
You want precise control over the symbols that are exported from a module or
package when a user uses the from module import * statement.
Define a variable __all__ in your module that explicitly lists the exported
names. For example:
# somemodule.py
def spam():
    pass

def grok():
    pass

blah = 42

# Only export 'spam' and 'grok'
__all__ = ['spam', 'grok']
Although the use of from module import * is strongly discouraged, it
still sees frequent use in modules that define a large number of
names. If you don’t do anything, this form of import will export all
names that don’t start with an underscore. On the other hand,
if __all__ is defined, then only the names explicitly listed will be
exported.
If you define __all__ as an empty list, then nothing will be exported.
An AttributeError is raised on import if __all__ contains undefined
names.
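For example, given a module such as this (badmodule is a hypothetical name):

# badmodule.py
def spam():
    pass

# 'grok' is listed, but never defined
__all__ = ['spam', 'grok']

a wildcard import fails (the exact message varies with the Python version):

>>> from badmodule import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'grok'
>>>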
You have code organized as a package and want to import a submodule from one of the other package submodules without hardcoding the package name into the import statement.
To import modules of a package from other modules in the same package,
use a package-relative import. For example, suppose you have a
package mypackage organized as follows on the filesystem:
mypackage/
    __init__.py
    A/
        __init__.py
        spam.py
        grok.py
    B/
        __init__.py
        bar.py

If the module mypackage.A.spam wants to import the module grok located
in the same directory, it should include an import statement like this:
# mypackage/A/spam.py
from . import grok
If the same module wants to import the module B.bar located in a different
directory, it can use an import statement like this:
# mypackage/A/spam.py
from ..B import bar
Both of the import statements shown operate relative to the location of the spam.py file and do not include the top-level package name.
Inside packages, imports involving modules in the same package can either use fully specified absolute names or relative imports using the syntax shown. For example:
# mypackage/A/spam.py
from mypackage.A import grok    # OK
from . import grok              # OK
import grok                     # Error (not found)
The downside of using an absolute name, such as mypackage.A, is that
it hardcodes the top-level package name into your source code. This,
in turn, makes your code more brittle and hard to work with if you
ever want to reorganize it. For
example, if you ever changed the name of the package, you would have
to go through all of your files and fix the source code. Similarly,
hardcoded names make it difficult for someone else to move the
code around. For example, perhaps someone wants to install two
different versions of a package, differentiating them only by name.
If relative imports are used, it would all work fine, whereas everything
would break with absolute names.
The . and .. syntax on the import statement might look funny, but
think of it as specifying a directory name. . means look in the current
directory and ..B means look in the ../B directory. This syntax
only works with the from form of import. For example:
from . import grok    # OK
import .grok          # ERROR
Although it looks like you could navigate the filesystem using a relative import, they are not allowed to escape the directory in which a package is defined. That is, combinations of dotted name patterns that would cause an import to occur from a non-package directory cause an error.
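For example, a relative import with too many leading dots fails outright (hypothetical names; the exact exception, ValueError or ImportError, depends on the Python version):

# mypackage/A/spam.py
from ... import utils     # Error: relative import escapes the mypackage directory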
Finally, it should be noted that relative imports only work for modules that are located inside a proper package. In particular, they do not work inside simple modules located at the top level of scripts. They also won’t work if parts of a package are executed directly as a script. For example:
% python3 mypackage/A/spam.py # Relative imports fail
On the other hand, if you execute the preceding script using the -m option
to Python, the relative imports will work properly. For example:
% python3 -m mypackage.A.spam # Relative imports work
For more background on relative package imports, see PEP 328.
You have a module that you would like to split into multiple files. However, you would like to do it without breaking existing code by keeping the separate files unified as a single logical module.
A program module can be split into separate files by turning it into a package. Consider the following simple module:
# mymodule.py
class A:
    def spam(self):
        print('A.spam')

class B(A):
    def bar(self):
        print('B.bar')
Suppose you want to split mymodule.py into two files, one for each class definition. To do that, start by replacing the mymodule.py file with a directory called mymodule. In that directory, create the following files:
mymodule/
    __init__.py
    a.py
    b.py

In the a.py file, put this code:
# a.py
class A:
    def spam(self):
        print('A.spam')
In the b.py file, put this code:
# b.py
from .a import A

class B(A):
    def bar(self):
        print('B.bar')
Finally, in the __init__.py file, glue the two files together:
# __init__.py
from .a import A
from .b import B
If you follow these steps, the resulting mymodule package will
appear to be a single logical module:
>>> import mymodule
>>> a = mymodule.A()
>>> a.spam()
A.spam
>>> b = mymodule.B()
>>> b.bar()
B.bar
>>>
The primary concern in this recipe is a design question
of whether or not you want users to work with a lot of
small modules or just a single module. For example,
in a large code base, you could just break everything up
into separate files and make users use a lot of import statements
like this:
from mymodule.a import A
from mymodule.b import B
...
This works, but it places more of a burden on the user to know where the different parts are located. Often, it’s just easier to unify things and allow a single import like this:
from mymodule import A, B
For this latter case, it’s most common to think of mymodule as being
one large source file. However, this recipe shows how to stitch
multiple files together into a single logical namespace. The key to
doing this is to create a package directory and to use the
__init__.py file to glue the parts together.
When a module gets split, you’ll need to pay careful attention to
cross-filename references. For instance, in this recipe, class B needs to access class A as a base class. A package-relative import
from .a import A is used to get it.
Package-relative imports are used throughout the recipe to avoid hardcoding the top-level module name into the source code. This makes it easier to rename the module or move it around elsewhere later (see Recipe 10.3).
One extension of this recipe involves the introduction of “lazy” imports. As shown, the __init__.py file imports all of the required subcomponents all at once. However, for a very large module, perhaps you only want to load components as they are needed. To do that, here is a slight variation of __init__.py:
# __init__.py
def A():
    from .a import A
    return A()

def B():
    from .b import B
    return B()
In this version, classes A and B have been replaced by functions
that load the desired classes when they are first accessed. To a
user, it won’t look much different. For example:
>>> import mymodule
>>> a = mymodule.A()
>>> a.spam()
A.spam
>>>
The main downside of lazy loading is that inheritance and type checking might break. For example, you might have to change your code slightly:
if isinstance(x, mymodule.A):    # Error
    ...

if isinstance(x, mymodule.a.A):  # Ok
    ...
For a real-world example of lazy loading, look at the source code for multiprocessing/__init__.py in the standard library.
You have a large base of code with parts possibly maintained and distributed by different people. Each part is organized as a directory of files, like a package. However, instead of having each part installed as a separately named package, you would like all of the parts to join together under a common package prefix.
Essentially, the problem here is that you would like to define a top-level Python package that serves as a namespace for a large collection of separately maintained subpackages. This problem often arises in large application frameworks where the framework developers want to encourage users to distribute plug-ins or add-on packages.
To unify separate directories under a common namespace, you organize the code just like a normal Python package, but you omit __init__.py files in the directories where the components are going to join together. To illustrate, suppose you have two different directories of Python code like this:
foo-package/
    spam/
        blah.py

bar-package/
    spam/
        grok.py

In these directories, the name spam is being used as a common
namespace. Observe that there is no __init__.py file
in either directory.
Now watch what happens if you add both foo-package and bar-package to
the Python module path and try some imports:
>>> import sys
>>> sys.path.extend(['foo-package', 'bar-package'])
>>> import spam.blah
>>> import spam.grok
>>>
You’ll observe that, by magic, the two different package
directories merge together and you can import either spam.blah
or spam.grok. It just works.
The mechanism at work here is a feature known as a “namespace package.” Essentially, a namespace package is a special kind of package designed for merging different directories of code together under a common namespace, as shown. For large frameworks, this can be useful, since it allows parts of a framework to be broken up into separately installed downloads. It also enables people to easily make third-party add-ons and other extensions to such frameworks.
The key to making a namespace package is to make sure there are no
__init__.py files in the top-level directory that is to serve as the
common namespace. The missing __init__.py file causes an
interesting thing to happen on package import. Instead of causing an
error, the interpreter instead starts creating a list of all
directories that happen to contain a matching package name. A special
namespace package module is then created and a read-only copy of the
list of directories is stored in its __path__ variable. For
example:
>>> import spam
>>> spam.__path__
_NamespacePath(['foo-package/spam', 'bar-package/spam'])
>>>
The directories on __path__ are used when locating further package
subcomponents (e.g., when importing spam.grok or spam.blah).
An important feature of namespace packages is that anyone can extend the namespace with their own code. For example, suppose you made your own directory of code like this:
my-package/
    spam/
        custom.py

If you added your directory of code to sys.path along with the
other packages, it would just seamlessly merge together with the
other spam package directories:
>>> import spam.custom
>>> import spam.grok
>>> import spam.blah
>>>
As a debugging tool, the main way that you can tell if a package
is serving as a namespace package is to check its __file__
attribute. If it’s missing altogether, the package is a namespace package.
This will also be indicated in the representation string by the
word “namespace”:
>>> spam.__file__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute '__file__'
>>> spam
<module 'spam' (namespace)>
>>>
Further information about namespace packages can be found in PEP 420.
You want to reload an already loaded module because you’ve made changes to its source. To reload a previously loaded module, use imp.reload(). For example:
>>> import spam
>>> import imp
>>> imp.reload(spam)
<module 'spam' from './spam.py'>
>>>
Reloading a module is often useful during debugging and development, but it is generally never safe in production code because it doesn’t always work as you expect.
Under the covers, the reload() operation wipes out the contents of a module’s underlying
dictionary and refreshes it by re-executing the module’s source code. The identity of the module object
itself remains unchanged. Thus, this operation updates the module everywhere that it
has been imported in a program.
However, reload() does not update definitions that have been imported using statements
such as from module import name. To illustrate, consider the following code:
# spam.py
def bar():
    print('bar')

def grok():
    print('grok')
Now start an interactive session:
>>> import spam
>>> from spam import grok
>>> spam.bar()
bar
>>> grok()
grok
>>>
Without quitting Python, go edit the source code to spam.py so that
the function grok() looks like this:
def grok():
    print('New grok')
Now go back to the interactive session, perform a reload, and try this experiment:
>>> import imp
>>> imp.reload(spam)
<module 'spam' from './spam.py'>
>>> spam.bar()
bar
>>> grok()        # Notice old output
grok
>>> spam.grok()   # Notice new output
New grok
>>>
In this example, you’ll observe that there are two versions of the
grok() function loaded. Generally, this is not what you want,
and is just the sort of thing that eventually leads to massive headaches.
For this reason, reloading of modules is probably something to be avoided in production code. Save it for debugging or for interactive sessions where you’re experimenting with the interpreter and trying things out.
You have a program that has grown beyond a simple script into an application involving multiple files. You’d like to have some easy way for users to run the program.
If your application program has grown into multiple files, you can put it into its own directory and add a __main__.py file. For example, you can create a directory like this:
myapplication/
    spam.py
    bar.py
    grok.py
    __main__.py

If __main__.py is present, you can simply run the Python interpreter on the top-level directory like this:
bash % python3 myapplication
The interpreter will execute the __main__.py file as the main program.
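The contents of __main__.py are just ordinary Python code. Here is a minimal sketch, assuming it drives the sibling files from the listing above (when you run python3 myapplication, that directory lands at the front of sys.path, so plain imports of the sibling files work):

# myapplication/__main__.py
import spam
import grok

def main():
    print('Starting myapplication')
    # ... use spam, grok, etc. ...

main()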
This technique also works if you package all of your code up into a zip file. For example:
bash % ls
spam.py bar.py grok.py __main__.py
bash % zip -r myapp.zip *.py
bash % python3 myapp.zip
... output from __main__.py ...

Creating a directory or zip file and adding a __main__.py file is one possible way to package a larger Python application. It’s a little different from a package in that the code isn’t meant to be used as a standard library module that’s installed into the Python library. Instead, it’s just a bundle of code that you want to hand someone to execute.
Since directories and zip files are a little different than normal files, you may also want to add a supporting shell script to make execution easier. For example, if the code was in a file named myapp.zip, you could make a top-level script like this:
#!/usr/bin/env python3 /usr/local/bin/myapp.zip
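Exact shebang argument handling varies across platforms, so if the one-line trick doesn’t work for you, an equivalent plain shell wrapper is a safe alternative (a sketch, assuming the same install path as above):

#!/bin/sh
exec python3 /usr/local/bin/myapp.zip "$@"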
Your package includes a datafile that your code needs to read. You need to do this in the most portable way possible.
Suppose you have a package with files organized as follows:
mypackage/
    __init__.py
    somedata.dat
    spam.py

Now suppose the file spam.py wants to read the contents of the file somedata.dat. To do it, use the following code:
# spam.py
import pkgutil

data = pkgutil.get_data(__package__, 'somedata.dat')
The resulting variable data will be a byte string containing the
raw contents of the file.
To read a datafile, you might be inclined to write code that uses
built-in I/O functions, such as open(). However, there are several
problems with this approach.
First, a package has very little control
over the current working directory of the interpreter. Thus, any
I/O operations would have to be programmed to use absolute filenames.
Since each module includes a __file__ variable with the full path,
it’s not impossible to figure out the location, but it’s messy.
Second, packages are often installed as .zip or .egg files,
which don’t preserve the files in the same way as a normal directory
on the filesystem. Thus, if you tried to use open() on a datafile
contained in an archive, it wouldn’t work at all.
The pkgutil.get_data() function is meant to be a high-level tool
for getting a datafile regardless of where or how a package has been
installed. It will simply “work” and return the file contents back
to you as a byte string.
The first argument to get_data() is a string containing the package
name. You can either supply it directly or use a special variable,
such as __package__. The second argument is the relative name of
the file within the package. If necessary, you can navigate into
different directories using standard Unix filename conventions as
long as the final directory is still located within the package.
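For example, since get_data() always returns raw bytes, decoding and subdirectory access both look like this (the filenames are placeholders for whatever your package actually ships):

# spam.py
import pkgutil

data = pkgutil.get_data(__package__, 'somedata.dat')
text = data.decode('utf-8')      # assuming the file holds UTF-8 text

# A file in a subdirectory of the package
extra = pkgutil.get_data(__package__, 'extras/more.dat')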
You have Python code that can’t be imported because it’s not located
in a directory listed in sys.path. You would like to add new
directories to Python’s path, but don’t want to hardwire it into
your code.
There are two common ways to get new directories added to sys.path.
First, you can add them through the use of the PYTHONPATH environment
variable. For example:
bash % env PYTHONPATH=/some/dir:/other/dir python3
Python 3.3.0 (default, Oct 4 2012, 10:17:33)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/some/dir', '/other/dir', ...]
>>>

In a custom application, this environment variable could be set at program startup or through a shell script of some kind.
The second approach is to create a .pth file that lists the directories like this:
# myapplication.pth
/some/dir
/other/dir

This .pth file needs to be placed into one of Python’s site-packages directories, which are typically located at /usr/local/lib/python3.3/site-packages or ~/.local/lib/python3.3/site-packages. On interpreter startup, the directories listed in the .pth file will be added to sys.path as long as they exist on the filesystem. Installation of a .pth file might require administrator access if it’s being added to the system-wide Python interpreter.
Faced with trouble locating files, you might be inclined to write code that
manually adjusts the value of sys.path. For example:
import sys
sys.path.insert(0, '/some/dir')
sys.path.insert(0, '/other/dir')
Although this “works,” it is extremely fragile in practice and should be avoided if possible. Part of the problem with this approach is that it adds hardcoded directory names to your source. This can cause maintenance problems if your code ever gets moved around to a new location. It’s usually much better to configure the path elsewhere in a manner that can be adjusted without making source code edits.
You can sometimes work around the problem of hardcoded directories if
you carefully construct an appropriate absolute path using
module-level variables, such as __file__. For example:
import sys
from os.path import abspath, join, dirname

sys.path.insert(0, join(abspath(dirname(__file__)), 'src'))
This adds a src directory to the path, where that directory sits alongside the code that’s executing the insertion step.
The site-packages directories are the locations where third-party modules and packages normally get installed. If your code was installed in that manner, that’s where it would be placed. Although .pth files for configuring the path must appear in site-packages, they can refer to any directories on the system that you wish. Thus, you can elect to have your code in a completely different set of directories as long as those directories are included in a .pth file.
You have the name of a module that you would like to import, but
it’s being held in a string. You would like to invoke the import
command on the string.
Use the importlib.import_module() function to manually import a
module or part of a package where the name is given as a string. For example:
>>> import importlib
>>> math = importlib.import_module('math')
>>> math.sin(2)
0.9092974268256817
>>> mod = importlib.import_module('urllib.request')
>>> u = mod.urlopen('http://www.python.org')
>>>
import_module simply performs the same steps as import, but returns
the resulting module object back to you as a result. You just need to
store it in a variable and use it like a normal module afterward.
If you are working with packages, import_module() can also be used
to perform relative imports. However, you need to give it an extra
argument. For example:
import importlib

# Same as 'from . import b'
b = importlib.import_module('.b', __package__)
The problem of manually importing modules with import_module() most
commonly arises when writing code that manipulates or wraps around
modules in some way. For example, perhaps you’re implementing a
customized importing mechanism of some kind where you need to load a
module by name and perform patches to the loaded code.
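For example, here is a sketch of a plugin loader where the module names come from configuration rather than from source code (the names shown are hypothetical):

import importlib

# e.g., names read from a config file or plugin registry
plugin_names = ['plugins.csvexport', 'plugins.jsonexport']
plugins = { name: importlib.import_module(name) for name in plugin_names }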
In older code, you will sometimes see the built-in __import__() function
used to perform imports. Although this works, importlib.import_module() is usually easier to use.
See Recipe 10.11 for an advanced example of customizing the import process.
You would like to customize Python’s import statement so that it can transparently load modules from a remote machine.
First, a serious disclaimer about security. The idea discussed in this
recipe would be wholly bad without some kind of extra security
and authentication layer. That said, the main goal is actually to
take a deep dive into the inner workings of Python’s import
statement. If you get this recipe to work and understand the inner
workings, you’ll have a solid foundation of customizing import for
almost any other purpose. With that out of the way, let’s carry on.
At the core of this recipe is a desire to extend the functionality of
the import statement. There are several approaches for doing this,
but for the purposes of illustration, start by making the following
directory of Python code:
testcode/
    spam.py
    fib.py
    grok/
        __init__.py
        blah.py

The content of these files doesn’t matter, but put a few simple statements and functions in each file so you can test them and see output when they’re imported. For example:
# spam.py
print("I'm spam")

def hello(name):
    print('Hello %s' % name)

# fib.py
print("I'm fib")

def fib(n):
    if n < 2:
        return 1
    else:
        return fib(n-1) + fib(n-2)

# grok/__init__.py
print("I'm grok.__init__")

# grok/blah.py
print("I'm grok.blah")
The goal here is to allow remote access to these files as modules. Perhaps the easiest way to do this is to publish them on a web server. Simply go to the testcode directory and run Python like this:
bash % cd testcode
bash % python3 -m http.server 15000
Serving HTTP on 0.0.0.0 port 15000 ...
Leave that server running and start up a separate Python interpreter.
Make sure you can access the remote files using urllib. For example:
>>> from urllib.request import urlopen
>>> u = urlopen('http://localhost:15000/fib.py')
>>> data = u.read().decode('utf-8')
>>> print(data)
# fib.py
print("I'm fib")

def fib(n):
    if n < 2:
        return 1
    else:
        return fib(n-1) + fib(n-2)

>>>
Loading source code from this server is going to form the basis for
the remainder of this recipe. Specifically, instead of manually
grabbing a file of source code using urlopen(), the import statement
will be customized to do it transparently behind the scenes.
The first approach to loading a remote module is to create an explicit loading function for doing it. For example:
import imp
import urllib.request
import sys

def load_module(url):
    u = urllib.request.urlopen(url)
    source = u.read().decode('utf-8')
    mod = sys.modules.setdefault(url, imp.new_module(url))
    code = compile(source, url, 'exec')
    mod.__file__ = url
    mod.__package__ = ''
    exec(code, mod.__dict__)
    return mod
This function merely downloads the source code, compiles it into a code
object using compile(), and executes it in the dictionary of a newly
created module object. Here’s how you would use the function:
>>> fib = load_module('http://localhost:15000/fib.py')
I'm fib
>>> fib.fib(10)
89
>>> spam = load_module('http://localhost:15000/spam.py')
I'm spam
>>> spam.hello('Guido')
Hello Guido
>>> fib
<module 'http://localhost:15000/fib.py' from 'http://localhost:15000/fib.py'>
>>> spam
<module 'http://localhost:15000/spam.py' from 'http://localhost:15000/spam.py'>
>>>
As you can see, it “works” for simple modules. However, it’s not plugged into the
usual import statement, and extending the code to support more advanced constructs,
such as packages, would require additional work.
A much slicker approach is to create a custom importer. The first way to do this is to create what’s known as a meta path importer. Here is an example:
# urlimport.py
import sys
import importlib.abc
import imp
from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from html.parser import HTMLParser

# Debugging
import logging
log = logging.getLogger(__name__)

# Get links from a given URL
def _get_links(url):
    class LinkParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            if tag == 'a':
                attrs = dict(attrs)
                links.add(attrs.get('href').rstrip('/'))
    links = set()
    try:
        log.debug('Getting links from %s' % url)
        u = urlopen(url)
        parser = LinkParser()
        parser.feed(u.read().decode('utf-8'))
    except Exception as e:
        log.debug('Could not get links. %s', e)
    log.debug('links: %r', links)
    return links

class UrlMetaFinder(importlib.abc.MetaPathFinder):
    def __init__(self, baseurl):
        self._baseurl = baseurl
        self._links = { }
        self._loaders = { baseurl : UrlModuleLoader(baseurl) }

    def find_module(self, fullname, path=None):
        log.debug('find_module: fullname=%r, path=%r', fullname, path)
        if path is None:
            baseurl = self._baseurl
        else:
            if not path[0].startswith(self._baseurl):
                return None
            baseurl = path[0]
        parts = fullname.split('.')
        basename = parts[-1]
        log.debug('find_module: baseurl=%r, basename=%r', baseurl, basename)

        # Check link cache
        if baseurl not in self._links:
            self._links[baseurl] = _get_links(baseurl)

        # Check if it's a package
        if basename in self._links[baseurl]:
            log.debug('find_module: trying package %r', fullname)
            fullurl = self._baseurl + '/' + basename
            # Attempt to load the package (which accesses __init__.py)
            loader = UrlPackageLoader(fullurl)
            try:
                loader.load_module(fullname)
                self._links[fullurl] = _get_links(fullurl)
                self._loaders[fullurl] = UrlModuleLoader(fullurl)
                log.debug('find_module: package %r loaded', fullname)
            except ImportError as e:
                log.debug('find_module: package failed. %s', e)
                loader = None
            return loader

        # A normal module
        filename = basename + '.py'
        if filename in self._links[baseurl]:
            log.debug('find_module: module %r found', fullname)
            return self._loaders[baseurl]
        else:
            log.debug('find_module: module %r not found', fullname)
            return None

    def invalidate_caches(self):
        log.debug('invalidating link cache')
        self._links.clear()

# Module Loader for a URL
class UrlModuleLoader(importlib.abc.SourceLoader):
    def __init__(self, baseurl):
        self._baseurl = baseurl
        self._source_cache = {}

    def module_repr(self, module):
        return '<urlmodule %r from %r>' % (module.__name__, module.__file__)

    # Required method
    def load_module(self, fullname):
        code = self.get_code(fullname)
        mod = sys.modules.setdefault(fullname, imp.new_module(fullname))
        mod.__file__ = self.get_filename(fullname)
        mod.__loader__ = self
        mod.__package__ = fullname.rpartition('.')[0]
        exec(code, mod.__dict__)
        return mod

    # Optional extensions
    def get_code(self, fullname):
        src = self.get_source(fullname)
        return compile(src, self.get_filename(fullname), 'exec')

    def get_data(self, path):
        pass

    def get_filename(self, fullname):
        return self._baseurl + '/' + fullname.split('.')[-1] + '.py'

    def get_source(self, fullname):
        filename = self.get_filename(fullname)
        log.debug('loader: reading %r', filename)
        if filename in self._source_cache:
            log.debug('loader: cached %r', filename)
            return self._source_cache[filename]
        try:
            u = urlopen(filename)
            source = u.read().decode('utf-8')
            log.debug('loader: %r loaded', filename)
            self._source_cache[filename] = source
            return source
        except (HTTPError, URLError) as e:
            log.debug('loader: %r failed. %s', filename, e)
            raise ImportError("Can't load %s" % filename)

    def is_package(self, fullname):
        return False

# Package loader for a URL
class UrlPackageLoader(UrlModuleLoader):
    def load_module(self, fullname):
        mod = super().load_module(fullname)
        mod.__path__ = [ self._baseurl ]
        mod.__package__ = fullname
        return mod

    def get_filename(self, fullname):
        return self._baseurl + '/' + '__init__.py'

    def is_package(self, fullname):
        return True

# Utility functions for installing/uninstalling the loader
_installed_meta_cache = { }

def install_meta(address):
    if address not in _installed_meta_cache:
        finder = UrlMetaFinder(address)
        _installed_meta_cache[address] = finder
        sys.meta_path.append(finder)
        log.debug('%r installed on sys.meta_path', finder)

def remove_meta(address):
    if address in _installed_meta_cache:
        finder = _installed_meta_cache.pop(address)
        sys.meta_path.remove(finder)
        log.debug('%r removed from sys.meta_path', finder)
Here is an interactive session showing how to use the preceding code:
>>> # importing currently fails
>>> import fib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'fib'
>>> # Load the importer and retry (it works)
>>> import urlimport
>>> urlimport.install_meta('http://localhost:15000')
>>> import fib
I'm fib
>>> import spam
I'm spam
>>> import grok.blah
I'm grok.__init__
I'm grok.blah
>>> grok.blah.__file__
'http://localhost:15000/grok/blah.py'
>>>
This particular solution involves installing an instance of a special finder object
UrlMetaFinder as the last entry in sys.meta_path.
Whenever modules
are imported, the finders in sys.meta_path are consulted in order to
locate the module. In this example, the UrlMetaFinder instance becomes a
finder of last resort that’s triggered when a module can’t be
found in any of the normal locations.
As for the general implementation approach, the UrlMetaFinder class
wraps around a user-specified URL. Internally, the finder builds sets
of valid links by scraping them from the given URL. When imports are
made, the module name is compared against this set of known links. If
a match can be found, a separate UrlModuleLoader class is used to
load source code from the remote machine and create the resulting
module object. One reason for caching the links is to avoid
unnecessary HTTP requests on repeated imports.
The second approach to customizing import is to write a hook that
plugs directly into the sys.path variable, recognizing certain
directory naming patterns. Add the following class and support
functions to urlimport.py:
# urlimport.py
# ... include previous code above ...

# Path finder class for a URL
class UrlPathFinder(importlib.abc.PathEntryFinder):
    def __init__(self, baseurl):
        self._links = None
        self._loader = UrlModuleLoader(baseurl)
        self._baseurl = baseurl

    def find_loader(self, fullname):
        log.debug('find_loader: %r', fullname)
        parts = fullname.split('.')
        basename = parts[-1]

        # Check link cache
        if self._links is None:
            self._links = []     # See discussion
            self._links = _get_links(self._baseurl)

        # Check if it's a package
        if basename in self._links:
            log.debug('find_loader: trying package %r', fullname)
            fullurl = self._baseurl + '/' + basename
            # Attempt to load the package (which accesses __init__.py)
            loader = UrlPackageLoader(fullurl)
            try:
                loader.load_module(fullname)
                log.debug('find_loader: package %r loaded', fullname)
            except ImportError as e:
                log.debug('find_loader: %r is a namespace package', fullname)
                loader = None
            return (loader, [fullurl])

        # A normal module
        filename = basename + '.py'
        if filename in self._links:
            log.debug('find_loader: module %r found', fullname)
            return (self._loader, [])
        else:
            log.debug('find_loader: module %r not found', fullname)
            return (None, [])

    def invalidate_caches(self):
        log.debug('invalidating link cache')
        self._links = None

# Check path to see if it looks like a URL
_url_path_cache = { }

def handle_url(path):
    if path.startswith(('http://', 'https://')):
        log.debug('Handle path? %s. [Yes]', path)
        if path in _url_path_cache:
            finder = _url_path_cache[path]
        else:
            finder = UrlPathFinder(path)
            _url_path_cache[path] = finder
        return finder
    else:
        log.debug('Handle path? %s. [No]', path)

def install_path_hook():
    sys.path_hooks.append(handle_url)
    sys.path_importer_cache.clear()
    log.debug('Installing handle_url')

def remove_path_hook():
    sys.path_hooks.remove(handle_url)
    sys.path_importer_cache.clear()
    log.debug('Removing handle_url')
To use this path-based finder, you simply add URLs to sys.path. For example:
>>> # Initial import fails
>>> import fib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'fib'

>>> # Install the path hook
>>> import urlimport
>>> urlimport.install_path_hook()

>>> # Imports still fail (not on path)
>>> import fib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'fib'

>>> # Add an entry to sys.path and watch it work
>>> import sys
>>> sys.path.append('http://localhost:15000')
>>> import fib
I'm fib
>>> import grok.blah
I'm grok.__init__
I'm grok.blah
>>> grok.blah.__file__
'http://localhost:15000/grok/blah.py'
>>>
The key to this last example is the handle_url() function,
which is added to the sys.path_hooks variable. When
the entries on sys.path are being processed, the functions
in sys.path_hooks are invoked. If any of those functions return
a finder object, that finder is used to try to load modules for
that entry on sys.path.
It should be noted that the remotely imported modules work exactly like any other module. For instance:
>>> fib
<urlmodule 'fib' from 'http://localhost:15000/fib.py'>
>>> fib.__name__
'fib'
>>> fib.__file__
'http://localhost:15000/fib.py'
>>> import inspect
>>> print(inspect.getsource(fib))
# fib.py
print("I'm fib")

def fib(n):
    if n < 2:
        return 1
    else:
        return fib(n-1) + fib(n-2)

>>>
Before discussing this recipe in further detail, it should be emphasized
that Python’s module, package, and import mechanism is one of the
most complicated parts of the entire language—often poorly understood by
even the most seasoned Python programmers unless they’ve devoted effort
to peeling back the covers. There are several critical documents
that are worth reading, including the documentation for the importlib module and PEP 302. That documentation won’t be repeated here, but
some essential highlights will be discussed.
First, if you want to create a new module object, you use
the imp.new_module() function. For example:
>>> import imp
>>> m = imp.new_module('spam')
>>> m
<module 'spam'>
>>> m.__name__
'spam'
>>>
Module objects usually have a few expected attributes, including
__file__ (the name of the file that the module was loaded from) and __package__ (the name of the enclosing package, if any).
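For example, continuing the previous session, you can simply assign these attributes yourself:

>>> m.__file__ = 'spam.py'
>>> m.__package__ = ''
>>> m.__file__
'spam.py'
>>>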
Second, modules are cached by the interpreter. The module cache can
be found in the dictionary sys.modules. Because of this caching,
it’s common to combine caching and module creation together
into a single step. For example:
>>> import sys
>>> import imp
>>> m = sys.modules.setdefault('spam', imp.new_module('spam'))
>>> m
<module 'spam'>
>>>
The main reason for doing this is that if a module with the given name already exists, you’ll get the already created module instead. For example:
>>> import math
>>> m = sys.modules.setdefault('math', imp.new_module('math'))
>>> m
<module 'math' from '/usr/local/lib/python3.3/lib-dynload/math.so'>
>>> m.sin(2)
0.9092974268256817
>>> m.cos(2)
-0.4161468365471424
>>>
Since creating modules is easy, it is straightforward to write simple
functions, such as the load_module() function in the first part of
this recipe. A downside of this approach is that it is actually rather
tricky to handle more complicated cases, such as package imports. In
order to handle a package, you would have to reimplement much of the
underlying logic that’s already part of the normal import statement
(e.g., checking for directories, looking for __init__.py files,
executing those files, setting up paths, etc.). This complexity is
one of the reasons why it’s often better to extend the import statement
directly rather than defining a custom function.
Extending the import statement is straightforward, but involves a
number of moving parts. At the highest level, import operations are
processed by a list of “meta-path” finders that you can find in the
list sys.meta_path. If you output its value, you’ll see the following:
>>> from pprint import pprint
>>> pprint(sys.meta_path)
[<class '_frozen_importlib.BuiltinImporter'>,
 <class '_frozen_importlib.FrozenImporter'>,
 <class '_frozen_importlib.PathFinder'>]
>>>
When executing a statement such as import fib, the interpreter walks
through the finder objects on sys.meta_path and invokes their
find_module() method in order to locate an appropriate
module loader. It helps to see this by experimentation, so define the
following class and try the following:
>>> class Finder:
...     def find_module(self, fullname, path):
...         print('Looking for', fullname, path)
...         return None
...
>>> import sys
>>> sys.meta_path.insert(0, Finder())   # Insert as first entry
>>> import math
Looking for math None
>>> import types
Looking for types None
>>> import threading
Looking for threading None
Looking for time None
Looking for traceback None
Looking for linecache None
Looking for tokenize None
Looking for token None
>>>
Notice how the find_module() method is being triggered on
every import. The role of the path argument in this method is to
handle packages. When packages are imported, it is a list of the
directories that are found in the package’s __path__ attribute. These
are the paths that need to be checked to find package subcomponents.
For example, notice the path setting for xml.etree and xml.etree.ElementTree:
>>> import xml.etree.ElementTree
Looking for xml None
Looking for xml.etree ['/usr/local/lib/python3.3/xml']
Looking for xml.etree.ElementTree ['/usr/local/lib/python3.3/xml/etree']
Looking for warnings None
Looking for contextlib None
Looking for xml.etree.ElementPath ['/usr/local/lib/python3.3/xml/etree']
Looking for _elementtree None
Looking for copy None
Looking for org None
Looking for pyexpat None
Looking for ElementC14N None
>>>
The placement of the finder on sys.meta_path is critical. Move it from the front of the list to the end, and try more imports:
>>> del sys.meta_path[0]
>>> sys.meta_path.append(Finder())
>>> import urllib.request
>>> import datetime
Now you don’t see any output because the imports are being handled
by other entries in sys.meta_path. In this case, you would
only see it trigger when nonexistent modules are imported:
>>> import fib
Looking for fib None
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'fib'
>>> import xml.superfast
Looking for xml.superfast ['/usr/local/lib/python3.3/xml']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'xml.superfast'
>>>
The fact that you can install a finder to catch unknown modules is the key to the
UrlMetaFinder class in this recipe. An instance
of UrlMetaFinder is added to the end of sys.meta_path, where it serves as a kind
of importer of last resort. If the requested module name can’t be located by
any of the other import mechanisms, it gets handled by this finder.
Some care needs to be taken when handling packages. Specifically, the
value presented in the path argument needs to be checked to see if it starts
with the URL registered in the finder. If not, the submodule must belong to some
other finder and should be ignored.
Additional handling of packages is found in the UrlPackageLoader
class. This class, rather than importing the package name, tries to
load the underlying __init__.py file. It also sets the module
__path__ attribute. This last part is critical, as the value set
will be passed to subsequent find_module() calls when loading
package submodules.
The path-based import hook is an extension of these ideas, but based
on a somewhat different mechanism. As you know, sys.path is a list of
directories where Python looks for modules. For example:
>>> from pprint import pprint
>>> import sys
>>> pprint(sys.path)
['',
 '/usr/local/lib/python33.zip',
 '/usr/local/lib/python3.3',
 '/usr/local/lib/python3.3/plat-darwin',
 '/usr/local/lib/python3.3/lib-dynload',
 '/usr/local/lib/...3.3/site-packages']
>>>
Each entry in sys.path is additionally attached to a finder object.
You can view these finders by looking at sys.path_importer_cache:
>>> pprint(sys.path_importer_cache)
{'.': FileFinder('.'),
 '/usr/local/lib/python3.3': FileFinder('/usr/local/lib/python3.3'),
 '/usr/local/lib/python3.3/': FileFinder('/usr/local/lib/python3.3/'),
 '/usr/local/lib/python3.3/collections': FileFinder('...python3.3/collections'),
 '/usr/local/lib/python3.3/encodings': FileFinder('...python3.3/encodings'),
 '/usr/local/lib/python3.3/lib-dynload': FileFinder('...python3.3/lib-dynload'),
 '/usr/local/lib/python3.3/plat-darwin': FileFinder('...python3.3/plat-darwin'),
 '/usr/local/lib/python3.3/site-packages': FileFinder('...python3.3/site-packages'),
 '/usr/local/lib/python33.zip': None}
>>>
sys.path_importer_cache tends to be much larger than sys.path because it records finders
for all known directories where code is being loaded. This includes subdirectories of
packages which usually aren’t included on sys.path.
To execute import fib, the directories on sys.path are checked in order. For each directory,
the name fib is presented to the associated finder found in sys.path_importer_cache.
This is also something that you can investigate by making your own finder and putting an entry
in the cache. Try this experiment:
>>> class Finder:
...     def find_loader(self, name):
...         print('Looking for', name)
...         return (None, [])
...
>>> import sys
>>> # Add a "debug" entry to the importer cache
>>> sys.path_importer_cache['debug'] = Finder()
>>> # Add a "debug" directory to sys.path
>>> sys.path.insert(0, 'debug')
>>> import threading
Looking for threading
Looking for time
Looking for traceback
Looking for linecache
Looking for tokenize
Looking for token
>>>
Here, you’ve installed a new cache entry for the name debug and installed
the name debug as the first entry on sys.path. On all subsequent imports,
you see your finder being triggered. However, since it returns (None, []),
processing simply continues to the next entry.
The population of sys.path_importer_cache is controlled by a list of
functions stored in sys.path_hooks. Try this experiment, which clears the
cache and adds a new path checking function to sys.path_hooks:
>>> sys.path_importer_cache.clear()
>>> def check_path(path):
...     print('Checking', path)
...     raise ImportError()
...
>>> sys.path_hooks.insert(0, check_path)
>>> import fib
Checking debug
Checking .
Checking /usr/local/lib/python33.zip
Checking /usr/local/lib/python3.3
Checking /usr/local/lib/python3.3/plat-darwin
Checking /usr/local/lib/python3.3/lib-dynload
Checking /Users/beazley/.local/lib/python3.3/site-packages
Checking /usr/local/lib/python3.3/site-packages
Looking for fib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'fib'
>>>
As you can see, the check_path() function is being invoked for every
entry on sys.path. However, since an ImportError exception is raised,
nothing else happens (checking just moves to the next function on sys.path_hooks).
Using this knowledge of how sys.path is processed, you can install a custom
path checking function that looks for filename patterns, such as URLs. For
instance:
>>> def check_url(path):
...     if path.startswith('http://'):
...         return Finder()
...     else:
...         raise ImportError()
...
>>> sys.path.append('http://localhost:15000')
>>> sys.path_hooks[0] = check_url
>>> import fib
Looking for fib           # Finder output!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'fib'

>>> # Notice installation of Finder in sys.path_importer_cache
>>> sys.path_importer_cache['http://localhost:15000']
<__main__.Finder object at 0x10064c850>
>>>
This is the key mechanism at work in the last part of this recipe.
Essentially, a custom path checking function has been installed that
looks for URLs in sys.path. When they are encountered, a new
UrlPathFinder instance is created and installed into
sys.path_importer_cache. From that point forward, all import
statements that pass through that part of sys.path will try to use
your custom finder.
Package handling with a path-based importer is somewhat tricky, and relates
to the return value of the find_loader() method. For simple modules,
find_loader() returns a tuple (loader, None) where loader is an
instance of a loader that will import the module.
For a normal package, find_loader() returns a tuple (loader, path)
where loader is the loader instance that will import the package
(and execute __init__.py) and path is a list of the directories that
will make up the initial setting of the package’s __path__ attribute.
For example, if the base URL was http://localhost:15000 and
a user executed import grok, the path returned by find_loader()
would be [ 'http://localhost:15000/grok' ].
The find_loader() method must additionally account for the possibility of a
namespace package. A namespace package is a package where a valid
package directory name exists, but no __init__.py file can be found.
For this case, find_loader() must return a tuple (None, path)
where path is a list of directories that would have made up the
package’s __path__ attribute had it defined an __init__.py file.
For this case, the import mechanism moves on to check further
directories on sys.path. If more namespace packages are found, all
of the resulting paths are joined together to make a final namespace
package. See Recipe 10.5 for more information on namespace packages.
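To summarize, the possible return values of find_loader() map onto the cases just described like this (a comment-only sketch, not runnable code):

# (loader, [])      -> simple module; loader will be used to import it
# (loader, [url])   -> regular package; [url] seeds the package's __path__
# (None, [url])     -> namespace package portion; scanning of sys.path continues
# (None, [])        -> nothing found for this path entry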
There is a recursive element to package handling that is not
immediately obvious in the solution, but also at work. All packages
contain an internal path setting, which can be found in __path__
attribute. For example:
>>> import xml.etree.ElementTree
>>> xml.__path__
['/usr/local/lib/python3.3/xml']
>>> xml.etree.__path__
['/usr/local/lib/python3.3/xml/etree']
>>>
As mentioned, the setting of __path__ is controlled by the return
value of the find_loader() method. However, the subsequent processing of
__path__ is also handled by the functions in sys.path_hooks.
Thus, when package subcomponents are loaded, the entries in __path__ are
checked by the handle_url() function. This causes new instances of
UrlPathFinder to be created and added to sys.path_importer_cache.
One remaining tricky part of the implementation concerns the
behavior of the handle_url() function and its interaction with the
_get_links() function used internally. If your implementation of a
finder involves the use of other modules (e.g., urllib.request),
there is a possibility that those modules will attempt to make further
imports in the middle of the finder’s operation. This can actually
cause handle_url() and other parts of the finder to get executed in a kind of recursive loop. To
account for this possibility, the implementation maintains a cache of
created finders (one per URL). This avoids the problem of creating duplicate
finders. In addition, the following fragment of code ensures that the finder
doesn’t respond to any import requests while it’s in the process
of getting the initial set of links:
# Check link cache
if self._links is None:
    self._links = []     # See discussion
    self._links = _get_links(self._baseurl)
You may not need this checking in other implementations, but for this example involving URLs, it was required.
Finally, the invalidate_caches() method of both finders is a utility method
that is supposed to clear internal caches should the source code change. This
method is triggered when a user invokes importlib.invalidate_caches(). You
might use it if you want the URL importers to reread the list of links,
possibly for the purpose of being able to access newly added files.
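For example, a session that forces a rescan might look like this (assuming one of the finders from this recipe is installed and debug logging is enabled):

>>> import importlib
>>> importlib.invalidate_caches()
DEBUG:urlimport:invalidating link cache
>>>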
In comparing the two approaches (modifying sys.meta_path or using a
path hook), it helps to take a high-level view. Importers installed
using sys.meta_path are free to handle modules in any manner that
they wish. For instance, they could load modules out of a database or
import them in a manner that is radically different than normal
module/package handling. This freedom also means that such importers
need to do more bookkeeping and internal management. This explains,
for instance, why the implementation of UrlMetaFinder needs to do
its own caching of links, loaders, and other details. On the other
hand, path-based hooks are more narrowly tied to the processing of
sys.path. Because of the connection to sys.path, modules
loaded with such extensions will tend to have the same features as normal
modules and packages that programmers are used to.
Assuming that your head hasn’t completely exploded at this point, a key to understanding and experimenting with this recipe may be the added logging calls. You can enable logging and try experiments such as this:
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> import urlimport
>>> urlimport.install_path_hook()
DEBUG:urlimport:Installing handle_url
>>> import fib
DEBUG:urlimport:Handle path? /usr/local/lib/python33.zip. [No]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'fib'
>>> import sys
>>> sys.path.append('http://localhost:15000')
>>> import fib
DEBUG:urlimport:Handle path? http://localhost:15000. [Yes]
DEBUG:urlimport:Getting links from http://localhost:15000
DEBUG:urlimport:links: {'spam.py', 'fib.py', 'grok'}
DEBUG:urlimport:find_loader: 'fib'
DEBUG:urlimport:find_loader: module 'fib' found
DEBUG:urlimport:loader: reading 'http://localhost:15000/fib.py'
DEBUG:urlimport:loader: 'http://localhost:15000/fib.py' loaded
I'm fib
>>>
Last, but not least, spending some time sleeping with PEP 302 and the
documentation for importlib under your pillow may be advisable.
You want to patch or apply decorators to functions in an existing module. However, you only want to do it if the module actually gets imported and used elsewhere.
The essential problem here is that you would like to carry out actions in response to a module being loaded. Perhaps you want to trigger some kind of callback function that would notify you when a module was loaded.
This problem can be solved using the same import hook machinery discussed in Recipe 10.11. Here is a possible solution:
# postimport.py
import importlib
import sys
from collections import defaultdict

_post_import_hooks = defaultdict(list)

class PostImportFinder:
    def __init__(self):
        self._skip = set()

    def find_module(self, fullname, path=None):
        if fullname in self._skip:
            return None
        self._skip.add(fullname)
        return PostImportLoader(self)

class PostImportLoader:
    def __init__(self, finder):
        self._finder = finder

    def load_module(self, fullname):
        importlib.import_module(fullname)
        module = sys.modules[fullname]
        for func in _post_import_hooks[fullname]:
            func(module)
        self._finder._skip.remove(fullname)
        return module

def when_imported(fullname):
    def decorate(func):
        if fullname in sys.modules:
            func(sys.modules[fullname])
        else:
            _post_import_hooks[fullname].append(func)
        return func
    return decorate

sys.meta_path.insert(0, PostImportFinder())
To use this code, you use the when_imported() decorator. For example:
>>> from postimport import when_imported
>>> @when_imported('threading')
... def warn_threads(mod):
...     print('Threads? Are you crazy?')
...
>>> import threading
Threads? Are you crazy?
>>>
As a more practical example, maybe you want to apply decorators to existing definitions, such as shown here:
from functools import wraps
from postimport import when_imported

def logged(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        print('Calling', func.__name__, args, kwargs)
        return func(*args, **kwargs)
    return wrapper

# Example
@when_imported('math')
def add_logging(mod):
    mod.cos = logged(mod.cos)
    mod.sin = logged(mod.sin)
This recipe relies on the import hooks that were discussed in Recipe 10.11, with a slight twist.
First, the role of the @when_imported decorator is to register
handler functions that get triggered on import. The decorator
checks sys.modules to see if a module was already loaded. If so,
the handler is invoked immediately. Otherwise, the handler is added
to a list in the _post_import_hooks dictionary. The purpose
of _post_import_hooks is simply to collect all handler objects
that have been registered for each module. In principle, more than
one handler could be registered for a given module.
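For example, nothing stops you from stacking several handlers on the same module; they simply run in registration order (a sketch with hypothetical handler names):

>>> @when_imported('socket')
... def handler1(mod):
...     print('first handler')
...
>>> @when_imported('socket')
... def handler2(mod):
...     print('second handler')
...
>>> import socket
first handler
second handler
>>>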
To trigger the pending actions in _post_import_hooks after module
import, the PostImportFinder class is installed as the first item in
sys.meta_path. If you recall from Recipe 10.11,
sys.meta_path contains a list of finder objects that are consulted
in order to locate modules. By installing PostImportFinder as the
first item, it captures all module imports.
In this recipe, however, the role of PostImportFinder is not to load
modules, but to trigger actions upon the completion of an import. To
do this, the actual import is delegated to the other finders on
sys.meta_path. Rather than trying to do this directly, the function
importlib.import_module() is called recursively in the PostImportLoader
class. To avoid getting stuck in an infinite loop, PostImportFinder
keeps a set of all the modules that are currently in the process of
being loaded. If a module name is part of this set, it is simply
ignored by PostImportFinder. This is what causes the import request
to pass to the other finders on sys.meta_path.
After a module has been loaded with importlib.import_module(), all
handlers currently registered in _post_import_hooks are called
with the newly loaded module as an argument. From this point forward,
the handlers are free to do what they want with the module.
A major feature of the approach shown in this recipe is that the
patching of a module occurs in a seamless fashion, regardless of
where or how a module of interest is actually loaded. You simply write
a handler function that’s decorated with @when_imported() and it
all just magically works from that point forward.
One caution about this recipe is that it does not work for modules that have been
explicitly reloaded using imp.reload(). That is, if you reload a previously loaded
module, the post import handler function doesn’t get triggered again (all the
more reason to not use reload() in production code). On the other hand,
if you delete the module from sys.modules and redo the import, you’ll see
the handler trigger again.
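For example, here is a sketch of forcing the handlers to fire a second time, reusing the threading handler from earlier:

>>> import sys
>>> del sys.modules['threading']    # discard the cached module
>>> import threading                # fresh import; registered handlers run again
Threads? Are you crazy?
>>>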
More information about post-import hooks can be found in PEP 369. As of
this writing, the PEP has been withdrawn by the author due to it being
out of date with the current implementation of the importlib module.
However, it is easy enough to implement your own solution using this
recipe.
You want to install a third-party package, but you don’t have permission to install packages into the system Python. Alternatively, perhaps you just want to install a package for your own use, not all users on the system.
Python has a per-user installation directory that’s typically located in a directory
such as ~/.local/lib/python3.3/site-packages. To force packages to install in this
directory, give the --user option to the installation command. For example:
python3 setup.py install --user
or
pip install --user packagename
The user site-packages directory normally appears before the system site-packages directory on
sys.path. Thus, packages you install using this technique take priority over the
packages already installed on the system (although this is not always the case depending
on the behavior of third-party package managers, such as distribute or pip).
Normally, packages get installed into the system-wide site-packages
directory, which is found in a location such as
/usr/local/lib/python3.3/site-packages. However, doing so typically
requires administrator permissions and use of the sudo command.
Even if you have permission to execute such a command, using sudo to
install a new, possibly unproven, package might give you some pause.
Installing packages into the per-user directory is often an effective workaround that allows you to create a custom installation.
As an alternative, you can also create a virtual environment, which is discussed in the next recipe.
You want to create a new Python environment in which you can install modules and packages. However, you want to do this without installing a new copy of Python or making changes that might affect the system Python installation.
You can make a new “virtual” environment using the pyvenv command. This command is installed
in the same directory as the Python interpreter or possibly in the Scripts directory on Windows.
Here is an example:
bash % pyvenv Spam
bash %

The name supplied to pyvenv is the name of a directory that will be created. Upon creation, the
Spam directory will look something like this:
bash % cd Spam
bash % ls
bin include lib pyvenv.cfg
bash %

In the bin directory, you’ll find a Python interpreter that you can use. For example:
bash % Spam/bin/python3
Python 3.3.0 (default, Oct 6 2012, 15:45:22)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from pprint import pprint
>>> import sys
>>> pprint(sys.path)
['',
'/usr/local/lib/python33.zip',
'/usr/local/lib/python3.3',
'/usr/local/lib/python3.3/plat-darwin',
'/usr/local/lib/python3.3/lib-dynload',
'/Users/beazley/Spam/lib/python3.3/site-packages']
>>>

A key feature of this interpreter is that its site-packages directory has been set to the newly created environment. Should you decide to install third-party packages, they will be installed here, not in the normal system site-packages directory.
The creation of a virtual environment mostly pertains to the
installation and management of third-party packages. As you can see
in the example, the sys.path variable contains directories from the
normal system Python, but the site-packages directory has been
relocated to a new directory.
With a new virtual environment, the next step is often to install a
package manager, such as distribute or pip. When installing such
tools and subsequent packages, you just need to make sure you use the
interpreter that’s part of the virtual environment. This should install
the packages into the newly created site-packages directory.
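For example, installing a downloaded package might look like this (a sketch; somepackage-1.0 is a placeholder, and the key point is invoking the environment’s interpreter):

bash % cd somepackage-1.0
bash % /path/to/Spam/bin/python3 setup.py install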
Although a virtual environment might look like a copy of the Python installation, it really only consists of a few files and symbolic links. All of the standard library files and interpreter executables come from the original Python installation. Thus, creating such environments is easy, and takes almost no machine resources.
By default, virtual environments are completely clean and contain no third-party
add-ons. If you would like to include already installed packages as part of
a virtual environment, create the environment using the --system-site-packages
option. For example:
bash % pyvenv --system-site-packages Spam
bash %

More information about pyvenv and virtual environments can be found in PEP 405.
You’ve written a useful library, and you want to be able to give it away to others. If you’re going to start giving code away, the first thing to do is to give it a unique name and clean up its directory structure. For example, a typical library package might look something like this:
projectname/
    README.txt
    Doc/
        documentation.txt
    projectname/
        __init__.py
        foo.py
        bar.py
        utils/
            __init__.py
            spam.py
            grok.py
    examples/
        helloworld.py
        ...

To make the package something that you can distribute, first write a setup.py file that looks like this:
# setup.py
from distutils.core import setup

setup(name='projectname',
      version='1.0',
      author='Your Name',
      author_email='you@youraddress.com',
      url='http://www.you.com/projectname',
      packages=['projectname', 'projectname.utils'],
)
Next, make a file MANIFEST.in that lists various nonsource files that you want to include in your package:
# MANIFEST.in
include *.txt
recursive-include examples *
recursive-include Doc *

Make sure the setup.py and MANIFEST.in files appear in the top-level directory of your package. Once you have done this, you should be able to make a source distribution by typing a command such as this:
bash % python3 setup.py sdist
This will create a file such as projectname-1.0.zip or projectname-1.0.tar.gz, depending on the platform. If it all works, this file is suitable for giving to others or uploading to the Python Package Index.
For pure Python code, writing a plain setup.py file is usually
straightforward. One potential gotcha is that you have to manually
list every subdirectory that makes up the package’s source code. A
common mistake is to only list the top-level directory of a package
and to forget to include package subcomponents. This is why the
specification for packages in setup.py includes the list
packages=['projectname', 'projectname.utils'].
As most Python programmers know, there are many third-party packaging
options, including setuptools, distribute, and so forth. Some
of these are replacements for the distutils library found in the
standard library. Be aware that if you rely on these packages, users
may not be able to install your software unless they also install the
required package manager first. Because of this, you can almost never
go wrong by keeping things as simple as possible. At a bare minimum, make
sure your code can be installed using a standard Python 3 installation.
Additional features can be supported as an option if additional packages
are available.
Packaging and distribution of code involving C extensions can get considerably more complicated. Chapter 15 on C extensions has a few details on this. In particular, see Recipe 15.2.